Abstract:Early detection and accurate diagnosis are essential to improving patient outcomes. The use of convolutional neural networks (CNNs) for tumor detection has shown promise, but existing models often suffer from overparameterization, which limits their performance gains. In this study, fuzzy sigmoid convolution (FSC) is introduced along with two additional modules: top-of-the-funnel and middle-of-the-funnel. The proposed methodology significantly reduces the number of trainable parameters without compromising classification accuracy. A novel convolutional operator is central to this approach, effectively dilating the receptive field while preserving input data integrity. This enables efficient feature map reduction and enhances the model's tumor detection capability. In the FSC-based model, fuzzy sigmoid activation functions are incorporated within convolutional layers to improve feature extraction and classification. The inclusion of fuzzy logic into the architecture improves its adaptability and robustness. Extensive experiments on three benchmark datasets demonstrate the superior performance and efficiency of the proposed model. The FSC-based architecture achieved classification accuracies of 99.17%, 99.75%, and 99.89% on three different datasets. The model employs 100 times fewer parameters than large-scale transfer learning architectures, highlighting its computational efficiency and suitability for detecting brain tumors early. This research offers lightweight, high-performance deep-learning models for medical imaging applications.
Abstract:Recent advancements in lidar technology have led to improved point cloud resolution as well as the generation of 360 degrees, low-resolution images by encoding depth, reflectivity, or near-infrared light within each pixel. These images enable the application of deep learning (DL) approaches, originally developed for RGB images from cameras to lidar-only systems, eliminating other efforts, such as lidar-camera calibration. Compared with conventional RGB images, lidar imagery demonstrates greater robustness in adverse environmental conditions, such as low light and foggy weather. Moreover, the imaging capability addresses the challenges in environments where the geometric information in point clouds may be degraded, such as long corridors, and dense point clouds may be misleading, potentially leading to drift errors. Therefore, this paper proposes a novel framework that leverages DL-based colorization and super-resolution techniques on lidar imagery to extract reliable samples from lidar point clouds for odometry estimation. The enhanced lidar images, enriched with additional information, facilitate improved keypoint detection, which is subsequently employed for more effective point cloud downsampling. The proposed method enhances point cloud registration accuracy and mitigates mismatches arising from insufficient geometric information or misleading extra points. Experimental results indicate that our approach surpasses previous methods, achieving lower translation and rotation errors while using fewer points.
Abstract:Unprecedented agility and dexterous manipulation have been demonstrated with controllers based on deep reinforcement learning (RL), with a significant impact on legged and humanoid robots. Modern tooling and simulation platforms, such as NVIDIA Isaac Sim, have been enabling such advances. This article focuses on demonstrating the applications of Isaac in local planning and obstacle avoidance as one of the most fundamental ways in which a mobile robot interacts with its environments. Although there is extensive research on proprioception-based RL policies, the article highlights less standardized and reproducible approaches to exteroception. At the same time, the article aims to provide a base framework for end-to-end local navigation policies and how a custom robot can be trained in such simulation environment. We benchmark end-to-end policies with the state-of-the-art Nav2, navigation stack in Robot Operating System (ROS). We also cover the sim-to-real transfer process by demonstrating zero-shot transferability of policies trained in the Isaac simulator to real-world robots. This is further evidenced by the tests with different simulated robots, which show the generalization of the learned policy. Finally, the benchmarks demonstrate comparable performance to Nav2, opening the door to quick deployment of state-of-the-art end-to-end local planners for custom robot platforms, but importantly furthering the possibilities by expanding the state and action spaces or task definitions for more complex missions. Overall, with this article we introduce the most important steps, and aspects to consider, in deploying RL policies for local path planning and obstacle avoidance with Isaac Sim training, Gazebo testing, and ROS 2 for real-time inference in real robots. The code is available at https://github.com/sahars93/RL-Navigation.
Abstract:Federated learning (FL) has become one of the key methods for privacy-preserving collaborative learning, as it enables the transfer of models without requiring local data exchange. Within the FL framework, an aggregation algorithm is recognized as one of the most crucial components for ensuring the efficacy and security of the system. Existing average aggregation algorithms typically assume that all client-trained data holds equal value or that weights are based solely on the quantity of data contributed by each client. In contrast, alternative approaches involve training the model locally after aggregation to enhance adaptability. However, these approaches fundamentally ignore the inherent heterogeneity between different clients' data and the complexity of variations in data at the aggregation stage, which may lead to a suboptimal global model. To address these issues, this study proposes a novel dual-criterion weighted aggregation algorithm involving the quantity and quality of data from the client node. Specifically, we quantify the data used for training and perform multiple rounds of local model inference accuracy evaluation on a specialized dataset to assess the data quality of each client. These two factors are utilized as weights within the aggregation process, applied through a dynamically weighted summation of these two factors. This approach allows the algorithm to adaptively adjust the weights, ensuring that every client can contribute to the global model, regardless of their data's size or initial quality. Our experiments show that the proposed algorithm outperforms several existing state-of-the-art aggregation approaches on both a general-purpose open-source dataset, CIFAR-10, and a dataset specific to visual obstacle avoidance.
Abstract:Event cameras, inspired by biological vision, are asynchronous sensors that detect changes in brightness, offering notable advantages in environments characterized by high-speed motion, low lighting, or wide dynamic range. These distinctive properties render event cameras particularly effective for sensor fusion in robotics and computer vision, especially in enhancing traditional visual or LiDAR-inertial odometry. Conventional frame-based cameras suffer from limitations such as motion blur and drift, which can be mitigated by the continuous, low-latency data provided by event cameras. Similarly, LiDAR-based odometry encounters challenges related to the loss of geometric information in environments such as corridors. To address these limitations, unlike the existing event camera-related surveys, this paper presents a comprehensive overview of recent advancements in event-based sensor fusion for odometry applications particularly, investigating fusion strategies that incorporate frame-based cameras, inertial measurement units (IMUs), and LiDAR. The survey critically assesses the contributions of these fusion methods to improving odometry performance in complex environments, while highlighting key applications, and discussing the strengths, limitations, and unresolved challenges. Additionally, it offers insights into potential future research directions to advance event-based sensor fusion for next-generation odometry applications.
Abstract:In recent years, Light Detection and Ranging (LiDAR) technology, a critical sensor in robotics and autonomous systems, has seen significant advancements. These improvements include enhanced resolution of point clouds and the capability to provide 360{\deg} low-resolution images. These images encode various data such as depth, reflectivity, and near-infrared light within the pixels. However, an excessive density of points and conventional point cloud sampling can be counterproductive, particularly in applications such as LiDAR odometry, where misleading points and degraded geometry information may induce drift errors. Currently, extensive research efforts are being directed towards leveraging LiDAR-generated images to improve situational awareness. This paper presents a comprehensive review of current deep learning (DL) techniques, including colorization and super-resolution, which are traditionally utilized in conventional computer vision tasks. These techniques are applied to LiDAR-generated images and are analyzed qualitatively. Based on this analysis, we have developed a novel approach that selectively integrates the most suited colorization and super-resolution methods with LiDAR imagery to sample reliable points from the LiDAR point cloud. This approach aims to not only improve the accuracy of point cloud registration but also avoid mismatching caused by lacking geometry information, thereby augmenting the utility and precision of LiDAR systems in practical applications. In our evaluation, the proposed approach demonstrates superior performance compared to our previous work, achieving lower translation and rotation errors with a reduced number of points.
Abstract:Human pose estimation involves detecting and tracking the positions of various body parts using input data from sources such as images, videos, or motion and inertial sensors. This paper presents a novel approach to human pose estimation using machine learning algorithms to predict human posture and translate them into robot motion commands using ultra-wideband (UWB) nodes, as an alternative to motion sensors. The study utilizes five UWB sensors implemented on the human body to enable the classification of still poses and more robust posture recognition. This approach ensures effective posture recognition across a variety of subjects. These range measurements serve as input features for posture prediction models, which are implemented and compared for accuracy. For this purpose, machine learning algorithms including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and deep Multi-Layer Perceptron (MLP) neural network are employed and compared in predicting corresponding postures. We demonstrate the proposed approach for real-time control of different mobile/aerial robots with inference implemented in a ROS 2 node. Experimental results demonstrate the efficacy of the approach, showcasing successful prediction of human posture and corresponding robot movements with high accuracy.
Abstract:The increased data transmission and number of devices involved in communications among distributed systems make it challenging yet significantly necessary to have an efficient and reliable networking middleware. In robotics and autonomous systems, the wide application of ROS\,2 brings the possibility of utilizing various networking middlewares together with DDS in ROS\,2 for better communication among edge devices or between edge devices and the cloud. However, there is a lack of comprehensive communication performance comparison of integrating these networking middlewares with ROS\,2. In this study, we provide a quantitative analysis for the communication performance of utilized networking middlewares including MQTT and Zenoh alongside DDS in ROS\,2 among a multiple host system. For a complete and reliable comparison, we calculate the latency and throughput of these middlewares by sending distinct amounts and types of data through different network setups including Ethernet, Wi-Fi, and 4G. To further extend the evaluation to real-world application scenarios, we assess the drift error (the position changes) over time caused by these networking middlewares with the robot moving in an identical square-shaped path. Our results show that CycloneDDS performs better under Ethernet while Zenoh performs better under Wi-Fi and 4G. In the actual robot test, the robot moving trajectory drift error over time (96\,s) via Zenoh is the smallest. It is worth noting we have a discussion of the CPU utilization of these networking middlewares and the performance impact caused by enabling the security feature in ROS\,2 at the end of the paper.
Abstract:Keypoint detection and description play a pivotal role in various robotics and autonomous applications including visual odometry (VO), visual navigation, and Simultaneous localization and mapping (SLAM). While a myriad of keypoint detectors and descriptors have been extensively studied in conventional camera images, the effectiveness of these techniques in the context of LiDAR-generated images, i.e. reflectivity and ranges images, has not been assessed. These images have gained attention due to their resilience in adverse conditions such as rain or fog. Additionally, they contain significant textural information that supplements the geometric information provided by LiDAR point clouds in the point cloud registration phase, especially when reliant solely on LiDAR sensors. This addresses the challenge of drift encountered in LiDAR Odometry (LO) within geometrically identical scenarios or where not all the raw point cloud is informative and may even be misleading. This paper aims to analyze the applicability of conventional image key point extractors and descriptors on LiDAR-generated images via a comprehensive quantitative investigation. Moreover, we propose a novel approach to enhance the robustness and reliability of LO. After extracting key points, we proceed to downsample the point cloud, subsequently integrating it into the point cloud registration phase for the purpose of odometry estimation. Our experiment demonstrates that the proposed approach has comparable accuracy but reduced computational overhead, higher odometry publishing rate, and even superior performance in scenarios prone to drift by using the raw point cloud. This, in turn, lays a foundation for subsequent investigations into the integration of LiDAR-generated images with LO. Our code is available on GitHub: https://github.com/TIERS/ws-lidar-as-camera-odom.
Abstract:As multi-robot systems continue to advance and become integral to various applications, managing conflicts and ensuring secure access control are critical challenges that need to be addressed. Access control is essential in multi-robot systems to ensure secure and authorized interactions among robots, protect sensitive data, and prevent unauthorized access to resources. This paper presents a novel framework for customizable conflict resolution and attribute-based access control in multi-robot systems for ROS 2 leveraging the Hyperledger Fabric blockchain. We introduce an attribute-based access control (ABAC) Fabric-ROS 2 bridge to enable secure communication and control between users and robots. By defining conflict resolution policies based on task priorities, robot capabilities, and user-defined constraints, our framework offers a flexible way to resolve conflicts. Additionally, it incorporates attribute-based access control, granting access rights based on user and robot attributes. ABAC offers a modular approach to control access compared to existing access control approaches in ROS 2, such as SROS2. Through this framework, multi-robot systems can be managed efficiently, securely, and adaptably, ensuring controlled access to resources and managing conflicts. Our experimental evaluation shows that our framework marginally improves latency and throughput over exiting Fabric and ROS 2 integration solutions. At higher network load, it is the only solution to operate reliably without a diverging transaction commitment latency. We also demonstrate how conflicts arising from simultaneous control or a robot by two users are resolved in real-time and motion distortion is effectively eliminated.